
Merge UKCEH/Turing regional classifiers (#72) #112

Open
mihow wants to merge 11 commits into main from ukceh-merge-pr72

Conversation

@mihow
Collaborator

@mihow mihow commented Feb 17, 2026

Summary

  • Resolves merge conflicts from "Add UKCEH/Turing models" (#72, AMI-system:main) and merges the UKCEH/Turing regional species classifiers
  • Adds 6 new classifier pipelines: Japan, Kenya-Uganda, Madagascar, Singapore, Thailand, and Anguilla v02 (160 classes)
  • Fixes duplicate TuringKenyaUgandaSpeciesClassifier class definition from the upstream branch
  • Fixes double slashes in object store URLs
  • Uploads all missing model weights and category maps to the Arbutus object store
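
The double-slash fix listed above can be illustrated with a small helper. This is a minimal sketch and not the code from this PR: `join_url` is a hypothetical name, and the base URL is the Arbutus prefix that appears in the classifier definitions later in this thread.

```python
def join_url(base: str, *parts: str) -> str:
    """Join URL segments with exactly one slash between them."""
    # Strip slashes only at the joints so the "https://" scheme is untouched.
    # Empty segments are dropped so they cannot produce "//".
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return "/".join([base.rstrip("/")] + cleaned)


BASE = "https://object-arbutus.cloud.computecanada.ca/ami-models/"
url = join_url(BASE, "/moths/classification/", "01_japan_data_category_map.json")
print(url)
```

Building the URL from segments keeps a trailing slash on the base and a leading slash on the path from doubling up, which is one plausible way the original double slashes arose.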

Model weights uploaded to object store

9 new files uploaded to s3://ami-models/moths/classification/:

| File | Size | Status |
| --- | --- | --- |
| turing-japan_v01_resnet50_2024-11-22-17-22_state.pt | 287.3 MB | NEW |
| 01_japan_data_category_map.json | 174.9 KB | NEW |
| turing-madagascar_v01_resnet50_2024-07-01-13-01_state.pt | 267.9 MB | NEW |
| 01_madagascar_data_category_map.json | 139.2 KB | NEW |
| 02_madagascar_data_category_map.json | 99.8 KB | NEW |
| turing-singapore_v02_resnet50_2024-11-21-19-58_state.pt | 209.3 MB | NEW |
| turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt | 250.7 MB | NEW |
| 01_thailand_data_category_map.json | 108.1 KB | NEW |
| turing-anguilla_v02_resnet50_2024-11-19-19-17_state.pt | 191.1 MB | NEW (v02, 160 classes) |
| 02_anguilla_data_category_map_160cls.json | 4.2 KB | NEW (v02, 160 classes) |

All URLs verified returning HTTP 200. Existing files were not overwritten.
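
A check like the one described above can be reproduced with the standard library. The sketch below is an assumption about how such a verification could be done, not the script used for this PR; `head_status`, `check_urls`, and `missing` are hypothetical names. Network failures such as DNS errors raise `urllib.error.URLError` and are deliberately left unhandled.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still useful.
        return err.code


def check_urls(
    urls: Iterable[str], fetch: Callable[[str], int] = head_status
) -> Dict[str, int]:
    """Map each URL to its status code. `fetch` is injectable for testing."""
    return {url: fetch(url) for url in urls}


def missing(statuses: Dict[str, int]) -> List[str]:
    """URLs that did not answer with HTTP 200, sorted for stable output."""
    return sorted(url for url, code in statuses.items() if code != 200)
```

Calling `missing(check_urls(urls))` with the `weights_path` and `labels_path` values of every classifier would list any file that is still absent from the object store. A HEAD request avoids downloading the weight files, which are hundreds of megabytes each.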

Anguilla models

Two versions now available:

  • v01 (existing): 79 classes, trained June 2024 — pipeline slug anguilla_moths_turing_2024
  • v02 (new): 160 classes, trained Nov 2024 — pipeline slug anguilla_moths_turing_v02_2024

Note: missing Singapore category map

02_singapore_data_category_map.json was not in the original archive from UKCEH. The code references it but the file does not exist in the object store. The Singapore pipeline is disabled until this is resolved.

Supersedes #72.

Test plan

  • Verify each new pipeline loads and runs inference
  • Verify anguilla v01 still works unchanged
  • Verify anguilla v02 (160cls) works with new weights
  • Follow up on missing Singapore category map
  • Import species from each new category map into Antenna and match to parent genera/ancestors (without this, predictions will appear as species names but will have no taxonomic rank or parent taxa in the Antenna UI)

Summary by CodeRabbit

  • New Features
    • Added support for five new regional moth classifier variants: Anguilla v02, Japan, Madagascar, Thailand, and Singapore. Users can now select these additional classifiers for moth species identification and analysis.

KatrionaGoldmann and others added 8 commits January 27, 2025 15:25
Resolves merge conflicts and adds 5 new Turing regional species
classifiers: Japan, Kenya-Uganda, Madagascar, Singapore, Thailand.

Also fixes duplicate TuringKenyaUgandaSpeciesClassifier class and
double slashes in object store URLs from the upstream branch.

NOTE: Only kenya-uganda model weights and labels exist in the object
store. The following 4 models have 404 URLs and are NOT yet usable:
- turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt
- turing-madagascar_v01_resnet50_2024-07-01-13-01_state.pt
- turing-japan_v01_resnet50_2024-11-22-17-22_state.pt
- turing-singapore_v02_resnet50_2024-11-21-19-58_state.pt
Corresponding category_map JSON files are also missing.
Contact kgoldmann@turing.ac.uk to upload the weights.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 17, 2026

Walkthrough

Five new Turing-based moth species classifiers were added to the ML layer and exposed via API wrapper classes. Three of them (Anguilla v02, Japan, Madagascar) were registered as selectable classifiers in the API; the Singapore and Thailand wrappers were added but left unregistered (commented out).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| ML Model Definitions: trapdata/ml/models/classification.py | Adds five Turing-based species classifier classes: TuringAnguillaV02SpeciesClassifier, TuringJapanSpeciesClassifier, TuringMadagascarSpeciesClassifier, TuringSingaporeSpeciesClassifier, and TuringThailandSpeciesClassifier with metadata (name, description, weights_path, labels_path). |
| API Model Wrappers: trapdata/api/models/classification.py | Introduces API wrapper classes: MothClassifierTuringAnguillaV02, MothClassifierTuringJapan, MothClassifierTuringMadagascar, MothClassifierTuringSingapore, and MothClassifierTuringThailand (subclassing APIMothClassifier + respective ML classifier). Exports updated to include these symbols; a note/comment indicates Singapore category-map unavailability. |
| API Endpoint Registration: trapdata/api/api.py | Imports new wrapper classes and extends CLASSIFIER_CHOICES and lifespan/ready listings to include keys: anguilla_moths_turing_v02_2024, japan_moths_turing_2024, madagascar_moths_turing_2024. Singapore and Thailand pipeline keys are present in comments/remarks and are not registered as active choices. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through folders, sniffed each new byte,
Five moths took wing in morning light,
Anguilla, Japan, Madagascar in flight,
Singapore and Thailand tucked out of sight,
I munch a carrot and cheer — models take flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Merge UKCEH/Turing regional classifiers (#72)' accurately describes the main change: adding multiple new Turing-based regional moth classifier pipelines (Japan, Madagascar, Anguilla v02, etc.) from the UKCEH merge. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
trapdata/api/api.py (1)

46-50: ⚠️ Potential issue | 🔴 Critical

Service startup will fail if any classifier has a 404 labels_path URL.

initialize_service_info() instantiates all classifiers in CLASSIFIER_CHOICES without error handling. During instantiation, each classifier's __init__ calls get_labels(labels_path), which invokes get_or_download_file(). This function calls requests.get() and raises HTTPError for 404s or other HTTP errors (via response.raise_for_status() at line 107 in trapdata/ml/utils.py). Any unhandled exception propagates through lifespan(), causing FastAPI to abort startup entirely.

Since the PR description acknowledges four of the five newly added classifiers have 404 labels_path URLs, either upload the missing label files first or wrap initialize_service_info() in a try/except block to prevent a single failing classifier from bringing down the entire service.
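
The second option mentioned above, skipping a classifier that fails to load instead of aborting startup, could look roughly like the sketch below. The real signature of `initialize_service_info()` is not shown in this thread, so `build_available_classifiers` and its `choices` argument (a mapping from pipeline slug to a zero-argument constructor) are hypothetical stand-ins.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


def build_available_classifiers(
    choices: Dict[str, Callable[[], object]],
) -> Dict[str, object]:
    """Instantiate every classifier, skipping the ones that fail to load."""
    available: Dict[str, object] = {}
    for slug, factory in choices.items():
        try:
            available[slug] = factory()
        except Exception:
            # Log the full traceback and continue, so that one broken model
            # (for example a 404 on its labels file) does not stop startup.
            logger.exception("Skipping classifier %r: failed to initialize", slug)
    return available
```

Catching `Exception` per classifier is intentionally broad here: a missing file surfaces as an HTTP error, while a label/weight mismatch surfaces as a `RuntimeError`, and both should leave the remaining pipelines usable. The trade-off is that a misconfigured model is only visible in the logs, so the skipped slugs may also need to be reported by the service.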

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/api/api.py` around lines 46 - 50, The startup currently calls
initialize_service_info() inside lifespan() without error handling so any
HTTPError from get_or_download_file()/get_labels() during classifier
instantiation will abort FastAPI startup; wrap the initialization in a
try/except in lifespan() (catch Exception or requests.HTTPError) and on failure
log the full error (use logger.exception or include the exception message) and
set app.state.service_info to a safe default (e.g., empty dict or partial
result) so the service can start; alternatively, move try/except into
initialize_service_info() around each classifier construction so failing
classifiers are skipped but others are initialized.
🧹 Nitpick comments (1)
trapdata/ml/models/classification.py (1)

441-442: URL string style is inconsistent with the rest of the file.

All other classifiers in this file use parenthesized multi-line implicit string concatenation for weights_path / labels_path. The new four classifiers use single-line strings. Consider aligning for consistency.

♻️ Proposed style alignment (Thailand shown; apply same pattern to the other three)
 class TuringThailandSpeciesClassifier(SpeciesClassifier, Resnet50Classifier_Turing):
     name = "Turing Thailand Species Classifier"
     description = "Trained on 21st November 2024 by Turing team using Resnet50 model."
-    weights_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt"
-    labels_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/01_thailand_data_category_map.json"
+    weights_path = (
+        "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/"
+        "turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt"
+    )
+    labels_path = (
+        "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/"
+        "01_thailand_data_category_map.json"
+    )

Also applies to: 448-449, 455-456, 462-463

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/classification.py` around lines 441 - 442, The new
classifier assignments use single-line string literals for weights_path and
labels_path which is inconsistent with the rest of the file; update the four new
classifiers to use the same parenthesized multi-line implicit string
concatenation style as the other classifiers by replacing the single-line
weights_path and labels_path values with parenthesized, split strings (keeping
the exact URLs intact) for the variables named weights_path and labels_path in
the relevant classifier blocks in classification.py.

Comment on lines +440 to +441
description = "Trained on 11th November 2024 by Turing team using Resnet50 model."
weights_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt"

⚠️ Potential issue | 🟡 Minor

Inaccurate training dates in descriptions — Madagascar is significantly wrong.

Cross-referencing the description strings against the dates encoded in the weights_path filenames reveals mismatches:

| Class | Description date | Filename date |
| --- | --- | --- |
| TuringThailandSpeciesClassifier | "11th November 2024" | 2024-11-21 (21st Nov) |
| TuringMadagascarSpeciesClassifier | "11th November 2024" | 2024-07-01 (1st July 2024) |
| TuringJapanSpeciesClassifier | "19th November 2024" | 2024-11-22 (22nd Nov) |

The Madagascar discrepancy is the most critical — the description is ~4 months off and contradicts the filename. Please verify with the Turing team and correct before merging.
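
The cross-check described above can be automated. The following is a minimal sketch under two assumptions: weight filenames end in `_YYYY-MM-DD-HH-MM_state.pt` and descriptions start with "Trained on <ordinal> <Month> <Year>", as in the examples in this thread. The function names are hypothetical, and `%B` parsing assumes an English locale.

```python
import re
from datetime import date, datetime
from typing import Optional

# Date stamp embedded in the weights filename, e.g. "_2024-07-01-13-01_state.pt".
FILENAME_DATE = re.compile(r"_(\d{4}-\d{2}-\d{2})-\d{2}-\d{2}_state\.pt$")
# Human-readable date in the description, e.g. "Trained on 11th November 2024".
DESCRIPTION_DATE = re.compile(r"Trained on (\d{1,2})(?:st|nd|rd|th) (\w+) (\d{4})")


def date_from_weights_path(weights_path: str) -> Optional[date]:
    """Extract the training date encoded in a weights filename or URL."""
    match = FILENAME_DATE.search(weights_path)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()


def date_from_description(description: str) -> Optional[date]:
    """Extract the training date stated in a classifier description."""
    match = DESCRIPTION_DATE.search(description)
    if not match:
        return None
    day, month, year = match.groups()
    return datetime.strptime(f"{day} {month} {year}", "%d %B %Y").date()


weights = "turing-madagascar_v01_resnet50_2024-07-01-13-01_state.pt"
description = "Trained on 11th November 2024 by Turing team using Resnet50 model."
dates_agree = date_from_weights_path(weights) == date_from_description(description)
print(dates_agree)
```

A check of this kind only shows that the two dates disagree. It cannot tell which one is correct, so the corrected descriptions still need confirmation from the Turing team.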

📝 Proposed corrections
 class TuringThailandSpeciesClassifier(SpeciesClassifier, Resnet50Classifier_Turing):
     name = "Turing Thailand Species Classifier"
-    description = "Trained on 11th November 2024 by Turing team using Resnet50 model."
+    description = "Trained on 21st November 2024 by Turing team using Resnet50 model."

 class TuringMadagascarSpeciesClassifier(SpeciesClassifier, Resnet50Classifier_Turing):
     name = "Turing Madagascar Species Classifier"
-    description = "Trained on 11th November 2024 by Turing team using Resnet50 model."
+    description = "Trained on 1st July 2024 by Turing team using Resnet50 model."

 class TuringJapanSpeciesClassifier(SpeciesClassifier, Resnet50Classifier_Turing):
     name = "Turing Japan Species Classifier"
-    description = "Trained on 19th November 2024 by Turing team using Resnet50 model."
+    description = "Trained on 22nd November 2024 by Turing team using Resnet50 model."

Also applies to: 447-448, 454-455

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/classification.py` around lines 440 - 441, The description
strings for the classifier classes TuringThailandSpeciesClassifier,
TuringMadagascarSpeciesClassifier, and TuringJapanSpeciesClassifier are
inconsistent with the dates encoded in their weights_path filenames; update each
class's description field to match the date in its weights_path (e.g., use
2024-11-21 for the Thailand model, 2024-07-01 for the Madagascar model, and
2024-11-22 for the Japan model), and confirm with the Turing team before
committing to ensure the corrected human-readable dates are accurate.

Upload 9 new files to the Arbutus object store:
- Weights: japan, madagascar, singapore, thailand, anguilla v02
- Category maps: japan, madagascar (x2), thailand, anguilla 160cls

Add TuringAnguillaV02SpeciesClassifier (160 classes, Nov 2024)
as a separate pipeline alongside the existing v01 (79 classes).

All URLs verified returning HTTP 200.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mihow mihow mentioned this pull request Feb 17, 2026
The 02_singapore_data_category_map.json file was not included in
the original UKCEH archive. Weights are uploaded but the pipeline
is commented out of CLASSIFIER_CHOICES until the category map is
sourced and uploaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
trapdata/ml/models/classification.py (1)

454-476: Inconsistent URL string formatting with the rest of the file.

The four new classifiers use single-line URLs while every other classifier in this file uses parenthesized multi-line string concatenation. Consider normalizing for readability.

♻️ Proposed style normalization (Thailand shown; apply the same pattern to Madagascar, Japan, Singapore)
-    weights_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt"
-    labels_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/01_thailand_data_category_map.json"
+    weights_path = (
+        "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/"
+        "turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt"
+    )
+    labels_path = (
+        "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/"
+        "01_thailand_data_category_map.json"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/classification.py` around lines 454 - 476, The
weights_path and labels_path strings in TuringThailandSpeciesClassifier,
TuringMadagascarSpeciesClassifier, TuringJapanSpeciesClassifier, and
TuringSingaporeSpeciesClassifier are single-line URLs while the rest of the file
uses parenthesized multi-line string concatenation for long URLs; update the
weights_path and labels_path assignments in those classes (refer to the class
symbols TuringThailandSpeciesClassifier, TuringMadagascarSpeciesClassifier,
TuringJapanSpeciesClassifier, TuringSingaporeSpeciesClassifier and the
attributes weights_path and labels_path) to use the same parenthesized
multi-line string concatenation style used elsewhere for consistency and
readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@trapdata/ml/models/classification.py`:
- Around line 451-476: Four classifiers (TuringThailandSpeciesClassifier,
TuringMadagascarSpeciesClassifier, TuringJapanSpeciesClassifier,
TuringSingaporeSpeciesClassifier) reference weights_path/labels_path URLs that
return 404 and will fail during the base-class download step; either update each
class's weights_path and labels_path to valid, accessible URLs (ensure the
verification script returns HTTP 200 for both files) or remove/disable their
registration in the API (trapdata/api/api.py) until the assets are
uploaded—locate the classes above and replace the broken URLs with the correct
object-arbutus links or comment out/unregister these class definitions so the
API does not advertise non-functional models.

---

Duplicate comments:
In `@trapdata/ml/models/classification.py`:
- Line 453: The description strings for some trained models are incorrect;
update the relevant description assignments (the description variable in the
country-specific model definitions) to match their filenames: for Thailand
change "Trained on 11th November 2024 by Turing team using Resnet50 model." to
"Trained on 21st November 2024 by Turing team using Resnet50 model.", for
Madagascar change "Trained on 11th November 2024 by Turing team using Resnet50
model." to "Trained on 1st July 2024 by Turing team using Resnet50 model.", and
for Japan change "Trained on 19th November 2024 by Turing team using Resnet50
model." to "Trained on 22nd November 2024 by Turing team using Resnet50 model.";
ensure these exact strings replace the current description assignments so the
dates now match their filenames.

---

Nitpick comments:
In `@trapdata/ml/models/classification.py`:
- Around line 454-476: The weights_path and labels_path strings in
TuringThailandSpeciesClassifier, TuringMadagascarSpeciesClassifier,
TuringJapanSpeciesClassifier, and TuringSingaporeSpeciesClassifier are
single-line URLs while the rest of the file uses parenthesized multi-line string
concatenation for long URLs; update the weights_path and labels_path assignments
in those classes (refer to the class symbols TuringThailandSpeciesClassifier,
TuringMadagascarSpeciesClassifier, TuringJapanSpeciesClassifier,
TuringSingaporeSpeciesClassifier and the attributes weights_path and
labels_path) to use the same parenthesized multi-line string concatenation style
used elsewhere for consistency and readability.

The Thailand category map has 3822 classes but the model weights only
have 3800 output neurons, causing a RuntimeError at model load time.
This also crashes the /info endpoint since it initializes all classifiers
at startup. Disabling until corrected data is available from UKCEH.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
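
The mismatch described in the commit message above (3822 labels against 3800 output neurons) can be caught before the model is loaded. The sketch below is an illustration under assumptions: the category map is JSON with one entry per class, and the final layer's weights are stored under the key `fc.weight`, which is the torchvision ResNet convention and may differ in the Turing checkpoints. All function names are hypothetical.

```python
import json
from typing import Any, Mapping


def count_labels(category_map_json: str) -> int:
    """Number of classes in a category map given as JSON text."""
    return len(json.loads(category_map_json))


def output_neurons(state_dict: Mapping[str, Any], key: str = "fc.weight") -> int:
    """Rows of the final linear layer's weight matrix (out_features)."""
    return state_dict[key].shape[0]


def check_class_count(
    category_map_json: str, state_dict: Mapping[str, Any], key: str = "fc.weight"
) -> None:
    """Raise ValueError if the label count and the output size disagree."""
    labels = count_labels(category_map_json)
    neurons = output_neurons(state_dict, key)
    if labels != neurons:
        raise ValueError(
            f"category map has {labels} classes but weights have "
            f"{neurons} output neurons"
        )
```

With PyTorch installed, the `state_dict` argument would typically come from `torch.load(path, map_location="cpu")`. The helper only reads `.shape`, so it has no PyTorch dependency of its own. Running it over every classifier in a test would surface a bad map/weights pair with a clearer message than the size-mismatch `RuntimeError` from `load_state_dict()`.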

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@trapdata/api/api.py`:
- Line 16: Remove the unused "noqa: E501" suppression on the import line
starting with "from .models.classification import (" (RUF100) and convert the
explanatory text that follows it ("disabled: Singapore - no category map,
Thailand - map/weights mismatch") into a regular inline or above-line Python
comment so the rationale is preserved but the dead noqa directive is eliminated.

In `@trapdata/api/models/classification.py`:
- Around line 249-250: Add a disable comment above the
MothClassifierTuringThailand class like the one used for
MothClassifierTuringSingapore to make its disabled state explicit and prevent
surprise errors; specifically note that this classifier is disabled because the
category map lists 3822 classes while the model weights have 3800 output
neurons, so attempting to instantiate it will raise a RuntimeError in
model.load_state_dict(). Place the comment immediately before the
MothClassifierTuringThailand class definition to mirror the existing pattern
used for MothClassifierTuringSingapore.

In `@trapdata/ml/models/classification.py`:
- Around line 451-455: Add a 3-line "# NOTE:" comment block above the
TuringThailandSpeciesClassifier declaration (like the one above
TuringSingaporeSpeciesClassifier) explaining that this classifier is
intentionally disabled because the labels file contains 3822 classes while the
Resnet50 model checkpoint has 3800 output neurons which causes
model.load_state_dict() to raise a size-mismatch RuntimeError; instruct
maintainers to not instantiate this class until the weights or labels are
corrected (or the checkpoint is replaced) and reference the class name
TuringThailandSpeciesClassifier and the failing call model.load_state_dict for
clarity.

---

Duplicate comments:
In `@trapdata/ml/models/classification.py`:
- Around line 453-467: Update the description strings to match the dates
embedded in each class's weights filename: change
TuringThailandSpeciesClassifier.description to use "21st November 2024" to match
the file date 2024-11-21, change TuringMadagascarSpeciesClassifier.description
to use "1st July 2024" to match 2024-07-01, and change
TuringJapanSpeciesClassifier.description to use "22nd November 2024" to match
2024-11-22; ensure the same wording/format ("Trained on <ordinal> <Month> <Year>
by Turing team using Resnet50 model.") is preserved for each class
(TuringThailandSpeciesClassifier, TuringMadagascarSpeciesClassifier,
TuringJapanSpeciesClassifier).

 from ..common.logs import logger  # noqa: F401
 from . import settings
-from .models.classification import (
+from .models.classification import (  # noqa: E501 (disabled: Singapore - no category map, Thailand - map/weights mismatch)

⚠️ Potential issue | 🟡 Minor

Remove the unused noqa: E501 directive flagged by Ruff (RUF100).

E501 is not enabled in the linter config, so the suppression comment is dead. The explanatory text after it is useful and should be kept as a plain comment.

♻️ Proposed fix
-from .models.classification import (  # noqa: E501 (disabled: Singapore - no category map, Thailand - map/weights mismatch)
+from .models.classification import (  # disabled: Singapore - no category map, Thailand - map/weights mismatch
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 16-16: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/api/api.py` at line 16, Remove the unused "noqa: E501" suppression
on the import line starting with "from .models.classification import (" (RUF100)
and convert the explanatory text that follows it ("disabled: Singapore - no
category map, Thailand - map/weights mismatch") into a regular inline or
above-line Python comment so the rationale is preserved but the dead noqa
directive is eliminated.

Comment on lines +249 to +250
class MothClassifierTuringThailand(APIMothClassifier, TuringThailandSpeciesClassifier):
    pass

⚠️ Potential issue | 🟡 Minor

MothClassifierTuringThailand is missing a disable comment — inconsistent with MothClassifierTuringSingapore.

Line 253 has # Disabled: category map not available at this time before MothClassifierTuringSingapore. MothClassifierTuringThailand has no equivalent notice, even though it's also disabled and — more critically — will raise a RuntimeError on model.load_state_dict() due to the 3822 vs. 3800 class-count mismatch if anyone instantiates it directly.

♻️ Suggested comment
+# Disabled: model weights have 3800 output neurons but category map lists 3822 classes;
+# state_dict load will fail with a size-mismatch RuntimeError until corrected data is available.
 class MothClassifierTuringThailand(APIMothClassifier, TuringThailandSpeciesClassifier):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/api/models/classification.py` around lines 249 - 250, Add a disable
comment above the MothClassifierTuringThailand class like the one used for
MothClassifierTuringSingapore to make its disabled state explicit and prevent
surprise errors; specifically note that this classifier is disabled because the
category map lists 3822 classes while the model weights have 3800 output
neurons, so attempting to instantiate it will raise a RuntimeError in
model.load_state_dict(). Place the comment immediately before the
MothClassifierTuringThailand class definition to mirror the existing pattern
used for MothClassifierTuringSingapore.

Comment on lines +451 to +455
class TuringThailandSpeciesClassifier(SpeciesClassifier, Resnet50Classifier_Turing):
    name = "Turing Thailand Species Classifier"
    description = "Trained on 11th November 2024 by Turing team using Resnet50 model."
    weights_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/turing-thailand_v01_resnet50_2024-11-21-16-28_state.pt"
    labels_path = "https://object-arbutus.cloud.computecanada.ca/ami-models/moths/classification/01_thailand_data_category_map.json"

⚠️ Potential issue | 🟡 Minor

Thailand class is missing a disable note — unlike Singapore.

TuringSingaporeSpeciesClassifier has a clear 3-line # NOTE: block explaining why it's disabled and what's missing (lines 472–474). TuringThailandSpeciesClassifier has no equivalent, despite also being disabled — and for a more dangerous reason: the category map has 3822 classes but the model outputs 3800 neurons, so model.load_state_dict() will throw a size-mismatch RuntimeError on any attempt to instantiate this class.

🛡️ Suggested comment block
+# NOTE: Thailand model has a class-count mismatch: category map lists 3822 classes
+# but the model was trained with 3800 output neurons. Loading state_dict will fail
+# with a size-mismatch RuntimeError. Disabled until corrected weights or map is provided.
 class TuringThailandSpeciesClassifier(SpeciesClassifier, Resnet50Classifier_Turing):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@trapdata/ml/models/classification.py` around lines 451 - 455, Add a 3-line "#
NOTE:" comment block above the TuringThailandSpeciesClassifier declaration (like
the one above TuringSingaporeSpeciesClassifier) explaining that this classifier
is intentionally disabled because the labels file contains 3822 classes while
the Resnet50 model checkpoint has 3800 output neurons which causes
model.load_state_dict() to raise a size-mismatch RuntimeError; instruct
maintainers to not instantiate this class until the weights or labels are
corrected (or the checkpoint is replaced) and reference the class name
TuringThailandSpeciesClassifier and the failing call model.load_state_dict for
clarity.
